Include the names of your collaborators here.
This homework is dedicated to working with non-Bayesian
regularization. You will fit your own ridge and lasso penalized
regression models in order to gain experience with interpreting
elastic net model results. You will also practice
tuning a penalized regression model by finding the
best regularization strength, \(\lambda\), value using a train/test split.
You will then use the glmnet package and caret
to tune elastic net models in order to identify the most important
features in a model by turning off unimportant features.
This assignment will provide practice working with realistic modeling situations such as large numbers of inputs, examining the influence of correlated features, working with interactions, and working with categorical inputs.
If you do not have glmnet or caret
installed please download both before starting the assignment.
You will also work with the corrplot package in this
assignment. You must download and install corrplot before
starting the assignment.
IMPORTANT: The RMarkdown assumes you have downloaded the data sets (CSV files) to the same directory you saved the template Rmarkdown file. If you do not have the CSV files in the correct location, the data will not be loaded correctly.
Certain code chunks are created for you. Each code chunk has
eval=FALSE set in the chunk options. You
MUST change it to be eval=TRUE in order
for the code chunks to be evaluated when rendering the document.
You are free to add more code chunks if you would like.
This assignment will use packages from the tidyverse
suite.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
This assignment also uses the broom package. The
broom package is part of tidyverse and so you
have it installed already. The broom package will be used
via the :: operator later in the assignment and so you do
not need to load it directly.
The glmnet package will be loaded later in the
assignment. As stated previously, please download and install
glmnet if you do not currently have it.
You will gain experience with non-Bayesian Ridge and Lasso regression by working with a large number of inputs in this problem. The training data are loaded for you below.
dfA_train <- readr::read_csv('hw09_probA_train.csv', col_names = TRUE)
## Rows: 96 Columns: 76
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (76): x01, x02, x03, x04, x05, x06, x07, x08, x09, x10, x11, x12, x13, x...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
All input variables have names that start with x and the
output is named y. The glimpse provided below shows there
are many input columns!
dfA_train %>% glimpse()
## Rows: 96
## Columns: 76
## $ x01 <dbl> -2.06943319, -1.29642012, 0.91302407, 0.53847200, 1.58258670, -2.3…
## $ x02 <dbl> -0.69220535, -0.74646173, 0.50415529, -2.24983365, -0.06869533, -1…
## $ x03 <dbl> -1.01314640, 0.46711663, 0.35073648, -0.26764563, -0.06524989, 0.8…
## $ x04 <dbl> 1.3310201, -1.1901216, 0.4567449, 1.9486926, 1.1343209, -0.5150867…
## $ x05 <dbl> 0.2662435, -0.2801269, 0.5267086, -1.2564140, 1.5565706, -0.237865…
## $ x06 <dbl> -0.72471702, -0.09158292, -0.70865959, 1.07960684, 0.72399642, -0.…
## $ x07 <dbl> 0.119003559, 1.794130638, -0.131307071, -0.634631730, 0.791042823,…
## $ x08 <dbl> -0.21240839, 1.43690253, 0.37810170, -0.95218224, 0.02586663, -0.9…
## $ x09 <dbl> 0.86530394, -0.13721049, -0.87982548, -1.81113831, -0.60337380, -1…
## $ x10 <dbl> -0.67479792, 0.49138297, -0.40283973, 1.21654838, 1.37487525, -2.9…
## $ x11 <dbl> 0.4768842, 1.3187331, -0.2375626, -0.8417482, -0.2555588, -0.73944…
## $ x12 <dbl> -0.55028261, -0.57696349, 0.34973240, -0.31482804, -1.32927922, 1.…
## $ x13 <dbl> 1.14678993, -0.03428652, -0.36594800, -0.19693086, 2.64028389, -0.…
## $ x14 <dbl> 0.62387485, -0.25243935, -0.23542049, -1.60157449, 1.17348036, -1.…
## $ x15 <dbl> -0.11340307, 0.62118450, -0.07521731, 0.22553162, -0.03979362, -0.…
## $ x16 <dbl> -1.04538727, -1.65054917, 1.35264110, -0.74933493, -0.20964781, -0…
## $ x17 <dbl> 0.47243145, 0.05101400, -0.61833554, -0.03849856, -0.47878135, 0.8…
## $ x18 <dbl> -0.590890777, 1.169765794, -0.059145998, -0.309537577, 0.842366000…
## $ x19 <dbl> 0.53364049, 2.33425890, -0.47852405, 0.54132470, 0.60261169, -2.02…
## $ x20 <dbl> -0.05844756, -0.74187288, 0.95175354, -0.20784247, 1.13489628, -2.…
## $ x21 <dbl> 0.65525513, 1.68884588, -0.57038117, -1.80555409, 2.08148634, 0.53…
## $ x22 <dbl> -1.20221003, 1.74128942, -0.74659747, 0.87619987, -0.10779368, -0.…
## $ x23 <dbl> 0.87133442, -0.14821616, 2.14691210, -0.19088522, -0.29948310, 0.3…
## $ x24 <dbl> 0.87836580, 1.12460649, 0.22236027, 0.20390661, 0.72825984, -0.163…
## $ x25 <dbl> -0.64288824, -0.06167637, -0.34527940, 0.62755689, -0.29965224, -0…
## $ x26 <dbl> -0.43031692, -0.72294116, -0.02082838, -0.19986085, 0.94931216, -1…
## $ x27 <dbl> -0.549731535, 0.104640571, -0.005052764, -1.555809942, 0.800257477…
## $ x28 <dbl> -0.37636467, 1.05012853, 1.09066971, -1.73283373, -0.52552393, -0.…
## $ x29 <dbl> 1.29310734, 1.13690150, -0.32233373, -0.31890823, 0.40454183, -0.0…
## $ x30 <dbl> 1.60799472, -0.34339053, -1.94283061, -0.61647080, 0.74117929, 1.0…
## $ x31 <dbl> -1.8940459, -1.3444175, 0.4917738, 1.2478144, 0.7653535, 0.4159829…
## $ x32 <dbl> 1.69469353, 0.17893691, 0.08081505, -1.63310270, 0.75915432, 0.541…
## $ x33 <dbl> -1.55078952, -0.15141527, -0.44858391, 0.21312202, 0.63037564, 0.7…
## $ x34 <dbl> 2.07109783, 0.95276628, 1.32599060, 2.27539009, -0.12636968, 0.247…
## $ x35 <dbl> 1.82290334, -0.31907075, -1.16353613, 0.85407328, -1.28814754, 0.6…
## $ x36 <dbl> -0.7394679, -1.7322869, 1.0537616, 0.1366273, -0.4719353, 0.413739…
## $ x37 <dbl> 0.93508239, -0.79537446, -1.87254115, 0.11414416, 0.05009938, -0.1…
## $ x38 <dbl> -0.507117532, -0.001069243, -0.636333478, 0.419641050, -0.47657448…
## $ x39 <dbl> 0.22888128, 1.31815563, 0.42261732, 0.15486481, -0.94886750, 0.802…
## $ x40 <dbl> -1.53550785, 0.08945169, 0.10561738, 1.40392237, 3.16451628, -1.59…
## $ x41 <dbl> 0.54995962, -1.49673970, 0.45267944, 0.07628427, -0.55596341, -0.7…
## $ x42 <dbl> -1.8288926, 1.7451062, 0.8158462, -0.1390173, -0.8047000, -0.11506…
## $ x43 <dbl> 1.90648214, 0.67113809, -0.05362633, 0.39563373, 2.67730180, 0.922…
## $ x44 <dbl> 1.29447870, -1.68009812, -0.32558008, -1.59679394, -0.49488477, -0…
## $ x45 <dbl> 0.95338061, -0.99034220, -0.98995377, 0.26891339, 0.12166661, -0.9…
## $ x46 <dbl> -0.13483662, -0.50696860, -0.51738788, -2.06234042, -0.10680057, -…
## $ x47 <dbl> 2.43187944, 0.37847403, -0.06741761, 0.59418109, -0.22940658, -0.7…
## $ x48 <dbl> -0.9565450, 0.4803113, -0.3920700, -0.4585714, -1.9929885, 0.88366…
## $ x49 <dbl> 0.94853247, 0.26362181, -0.16464646, 0.11935923, -0.32447324, 0.04…
## $ x50 <dbl> 0.19018260, 1.13505566, -0.16972687, -1.50584636, -0.07841405, -1.…
## $ x51 <dbl> -0.6831468, 0.1100079, 0.7239210, 0.2311925, -0.7642252, 0.1093128…
## $ x52 <dbl> 0.65307722, -0.80046181, 0.86927368, 1.59861934, -0.47053886, -0.0…
## $ x53 <dbl> -0.95918072, 0.09582336, 0.05805066, -1.65059194, 0.52212944, -0.8…
## $ x54 <dbl> -0.41506279, 0.53415776, -1.08420393, 0.31151571, 1.11321863, 2.20…
## $ x55 <dbl> -0.9469963, -0.9124609, -1.4035349, -0.3556957, 0.7224917, -0.1489…
## $ x56 <dbl> 0.54883719, -1.91727346, 0.18654617, -0.75041670, -1.04649763, 0.3…
## $ x57 <dbl> -0.72422453, 1.43064413, -1.47964130, 0.57516981, 0.49719144, 0.58…
## $ x58 <dbl> -2.20950208, -0.23497601, 1.07557427, 0.24929374, 0.56432730, -0.2…
## $ x59 <dbl> -0.8485211, -0.5308077, 1.5559665, -1.6331253, -0.2573712, -0.2842…
## $ x60 <dbl> -2.52402189, 1.29349906, -0.20218647, 1.87527348, -0.87476342, -0.…
## $ x61 <dbl> -0.66892027, -1.72557732, 0.79925161, 1.80655230, 0.22016194, -2.3…
## $ x62 <dbl> -0.92960688, 0.04079371, 0.33876203, 0.72753901, -1.08300120, 0.71…
## $ x63 <dbl> -0.42021184, 2.03151546, -0.21235170, -0.54181759, 1.27431677, 0.1…
## $ x64 <dbl> -0.68218725, -0.13237431, 0.66637214, 0.63351491, 0.70331183, -1.2…
## $ x65 <dbl> -1.853949165, -0.006625952, -1.239688043, 1.724372575, -0.53278079…
## $ x66 <dbl> 0.758622374, -0.247157562, -1.170425464, 1.188988247, 0.423596811,…
## $ x67 <dbl> 1.506210427, 0.085497740, -1.698973185, 1.076071882, -0.459228447,…
## $ x68 <dbl> 0.32879336, 0.80760027, -2.87189688, 0.15125671, 0.24641242, 0.857…
## $ x69 <dbl> -0.22660315, -0.86527369, -0.37419920, 1.28663788, -0.17221500, -1…
## $ x70 <dbl> 0.15206087, 0.50449311, 1.24957901, -3.66360118, 0.46417993, -1.09…
## $ x71 <dbl> 1.10750748, -1.51821411, -1.04662914, 0.91478065, 0.31102705, -1.2…
## $ x72 <dbl> -0.46295847, 0.75343551, -1.12696912, -0.84281597, -1.22013393, 0.…
## $ x73 <dbl> -0.74889700, -1.29222023, -2.58930724, -1.74360767, -1.09158189, 0…
## $ x74 <dbl> 1.48771931, -0.56530125, -1.56187174, -1.10624410, -1.65851229, -0…
## $ x75 <dbl> -1.22958786, 0.57706123, 0.38631089, 1.37850476, -1.21528450, -2.7…
## $ y <dbl> -5.033324, -8.874837, 1.720462, 16.535303, 22.683417, -3.642452, -…
The data are reshaped for you into long-format. The output
y is preserved and all input columns are “stacked” on top
of each other. The name column provides the input’s name
and the value column is the value of the input. The
input_id column is created for you. It stores the input ID
as an integer instead of a string. The stringr package is
part of tidyverse.
lfA_train <- dfA_train %>%
tibble::rowid_to_column() %>%
pivot_longer(starts_with('x')) %>%
mutate(input_id = as.numeric(stringr::str_extract(name, '\\d+')))
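To see exactly what this reshape does, here is a minimal sketch on a hypothetical two-row, two-input tibble (the `demo` names are made up for illustration and are not part of the assignment data):

```r
library(tidyverse)

# hypothetical toy data mimicking the wide format: two inputs plus the output
demo <- tibble(x01 = c(1.1, 2.2), x02 = c(3.3, 4.4), y = c(10, 20))

demo_long <- demo %>%
  tibble::rowid_to_column() %>%
  pivot_longer(starts_with('x')) %>%
  mutate(input_id = as.numeric(stringr::str_extract(name, '\\d+')))

demo_long
# 4 rows: `rowid` and `y` are repeated for each stacked input,
# `name` holds the input's name, `value` its value, `input_id` its integer ID
```

Applying the same pattern to `dfA_train` stacks 75 inputs for each of the 96 rows, giving 96 × 75 = 7200 rows in `lfA_train`.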
It is always important to explore data before modeling. You will use regularized regression models in this assignment. Regularized regression requires that all features have roughly the same scale in order for the regularization strength to have the same influence.
Use a boxplot to check if all inputs have roughly the same
scale. Pipe the long-format data, lfA_train to
ggplot() and map the input_id to the
x aesthetic and the value to the
y aesthetic. Add the geom_boxplot() layer and
map the input_id to the group
aesthetic.
How do the scales compare across the inputs?
HINT: Remember the importance of the aes()
function!
Add your code chunk here.
lfA_train %>% ggplot(mapping = aes(x = input_id, y = value)) +
geom_boxplot(mapping = aes(group = input_id))
In the figure, each box corresponds to one input, and the extent of the box and whiskers reflects that input's scale. Based on this plot, the scales are roughly the same across the inputs, with only small differences.
Let’s now visualize the relationship between the output,
y, and each input. Each input will be associated with a
facet, but you will still need to show several figures because there are
so many inputs in this problem.
Create a scatter plot to show the relationship between the output and each input. Use facets for each input.
In the first code chunk, pipe the long-format data to
filter() and keep all rows where input_id is
less than 26. Pipe the result to ggplot() and map
value to the x aesthetic and map
y to the y aesthetic. Add the appropriate geom
to make the scatter plot and add a facet_wrap() layer with
the facets “as a function” of name.
In the second code chunk, pipe the long-format data to
filter() and keep all rows where input_id is
between 26 and 50. Use the same aesthetics and facet structure as the
first code chunk.
In the third code chunk, pipe the long-format data to
filter() and keep all rows where input_id is
greater than 50. Use the same aesthetics and facet structure as the
first code chunk.
Set the alpha to 0.33 in all three scatter
plots.
HINT: The between() function can help you
here…
### the first 25 inputs
lfA_train %>%
filter(input_id < 26) %>%
ggplot(aes(x = value, y = y)) +
geom_point(alpha = 0.33) +
facet_wrap(~ name, nrow = 5)
### inputs between 26 and 50
lfA_train %>%
filter(between(input_id, 26, 50)) %>%
ggplot(aes(x = value, y = y)) +
geom_point(alpha = 0.33) +
facet_wrap(~ name, nrow = 5)
### last 25 inputs
lfA_train %>%
filter(input_id > 50) %>%
ggplot(aes(x = value, y = y)) +
geom_point(alpha = 0.33) +
facet_wrap(~ name, nrow = 5)
Based on your previous figures, which inputs seem related to the output?
What do you think?
Based on my observations, inputs x03, x25, x66, x71, and x75 appear to be related to the output.
Let’s fit a non-Bayesian linear model by maximizing the likelihood.
You do not have to program the model from scratch. Instead, you are allowed
to use lm() to fit the model.
Fit a linear model by maximizing the likelihood using linear additive features for all inputs.
You must use the original wide-format data,
dfA_train.
Assign the model to the modA
object.
Add your code chunk here.
modA <- lm(y ~ ., data = dfA_train)
Use the coefplot::coefplot() function to
visualize the coefficient summaries for your linear additive features
model.
Add your code chunk here.
library(coefplot)
coefplot(modA)
There are other ways to examine the coefficient summaries besides the
coefficient plot visualization. The broom package has
multiple functions which can help interpret model behavior. The
broom package organizes model results into easy to
manipulate TIDY dataframes. You will use the broom::tidy()
function to examine the coefficient summaries. Applying the
broom::tidy() function to an lm() model object
produces a dataframe. One row is one coefficient. The term
column is the name of the feature the coefficient is applied to, and the
other columns provide different statistics about the coefficient. You
can use any other tidyverse function, such as
select(), filter(), count(), and
others to manipulate the broom::tidy() returned
dataframe!
Non-Bayesians focus on the p-value (p.value column from
broom::tidy()) to identify if the feature is statistically
significant.
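As a quick illustration of this workflow (using the built-in mtcars data as a stand-in, not the assignment data):

```r
library(tidyverse)

# fit a small linear model on built-in data
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# one row per coefficient: term, estimate, std.error, statistic, p.value
broom::tidy(fit_demo)

# keep only the terms that are statistically significant at the 0.05 level
broom::tidy(fit_demo) %>% filter(p.value < 0.05)
```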
Use broom::tidy() to identify which inputs
non-Bayesians will identify as being statistically
significant.
HINT: The common convention for statistical significance is 0.05…
Add your code chunk here.
library(broom)
tidy_modA <- tidy(modA)
tidy_modA %>%
filter(p.value < 0.05)
## # A tibble: 14 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 x02 -3.76 1.49 -2.52 0.0203
## 2 x03 7.65 2.01 3.81 0.00109
## 3 x07 -11.3 2.01 -5.63 0.0000165
## 4 x08 4.57 1.78 2.56 0.0186
## 5 x11 4.15 1.95 2.13 0.0460
## 6 x14 -5.82 2.04 -2.85 0.00989
## 7 x19 -3.26 1.52 -2.15 0.0443
## 8 x32 4.00 1.87 2.14 0.0447
## 9 x33 -3.84 1.53 -2.51 0.0207
## 10 x43 3.20 1.52 2.11 0.0480
## 11 x51 -5.35 2.16 -2.47 0.0225
## 12 x66 6.00 2.16 2.78 0.0115
## 13 x68 -3.98 1.79 -2.22 0.0381
## 14 x69 -5.41 2.16 -2.51 0.0209
You have explored the data and fit the first un-regularized non-Bayesian model. You have examined the statistically significant features, but now you must use regularized regression to identify the important features. Regularized regression is an important tool for “turning off” unimportant features in models, and it is especially useful when there are many features, as in this application. Consider for example that you are working at a startup company that produces components for medical devices. Your company makes the components with Additive Manufacturing (3D printing) and is trying to maximize the longevity of the printed part. Additive Manufacturing involves many different features that control the performance of a component. The company has tracked variables associated with the supplied material, the operation of the printer, and measured features associated with the printed objects. The company identified 75 inputs they think impact a Key Performance Indicator (KPI). It is your job to identify the most important inputs that influence the output!
This may seem like a daunting task, but that is where regularized regression can help! In lecture we introduced non-Bayesian regularized regression by stating the regression coefficients, the \(\boldsymbol{\beta}'s\), are estimated by minimizing the following loss function:
\[ SSE + \lambda \times penalty \]
The \(penalty\) formulation depends
on if we are using Ridge or Lasso regression. However, in this
assignment we will work with a slightly different loss function. We will
work with the loss function consistent with the glmnet
package:
\[ \frac{1}{2} MSE + \lambda \times penalty \]
This does not change the concepts discussed in lecture. It simply changes the magnitude of the regularization strength parameter, \(\lambda\). As an aside, this demonstrates why I prefer the Bayesian interpretation of regularization. The regularization is more interpretable in the Bayesian setting. The regularization is the prior standard deviation which controls the allowable range on the coefficients. But that is a philosophical debate for another time!
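For reference, the two penalty forms used later in this assignment are shown below. Consistent with the check values for the penalty functions programmed in this assignment, the sums run over all coefficients, including the intercept:

\[ penalty_{Ridge} = \sum_{j} \beta_j^2 \qquad \qquad penalty_{Lasso} = \sum_{j} \left| \beta_j \right| \]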
You must program the regularized loss function yourself, rather than
using glmnet to fit the regularized regression model! This
will highlight how the regularization strength is applied and how it
relates to the SSE (or MSE). This problem deals with programming the
RIDGE penalty and comparing the resulting
RIDGE estimates to the coefficient MLEs.
You will program the Ridge loss function in a manner consistent with the log-posterior functions from previous assignments. You must create an initial list of required information before you program the function. This will allow you to test your function to make sure you programmed it correctly.
Complete the code chunk below by filling in the fields to the
list of required information. The $yobs field is the
“regular” R vector for the output. The $design_matrix is
the design matrix for the problem. The $lambda field is the
regularization strength, \(\lambda\).
You must use the original wide format dfA_train
to specify the output and create the design matrix. The design matrix
must be made consistent with how we have organized design matrices in
lecture and thus must include the “intercept column of 1s”. Your design
matrix should use linear additive features for all inputs. Lastly, set
the regularization strength to 1.
check_info_A <- list(
yobs = dfA_train$y,
design_matrix = model.matrix(y ~ ., data = dfA_train),
lambda = 1
)
The function, loss_ridge(), is started for you in the
code chunk below. The loss_ridge() function consists of two
arguments. The first, betas, is the “regular” R vector of
the regression coefficients and the second, my_info, is the
list of required information. You must assume that my_info
contains fields you specified in the check_info_A list in
the previous problem.
Complete the code chunk below by calculating the mean trend, the model’s mean squared error (MSE) on the training set, and the RIDGE penalty, and lastly return the effective Ridge loss. The comments and variable names below state what you should calculate at each line.
To check if you programmed loss_ridge()
correctly, test the function with a guess of -1.2 for all
regression coefficients. If you programmed loss_ridge()
correctly you should get a value of 287.3114. As another test, use 0.2
for all regression coefficients. If you programmed
loss_ridge() correctly you should get a value of
111.2219.
loss_ridge <- function(betas, my_info)
{
# extract the design matrix
X <- my_info$design_matrix
# calculate linear predictor
mu <- as.vector(X %*% as.matrix(betas))
# calculate MSE
MSE <- mean((my_info$yobs - mu)^2)
# calculate ridge penalty
penalty <- sum(betas^2)
# return effective total loss
return(1/2 * MSE + my_info$lambda * penalty)
}
Test loss_ridge() with -1.2 for all regression
coefficients.
betas <- rep(-1.2, ncol(check_info_A$design_matrix))
loss_ridge(betas, check_info_A)
## [1] 287.3114
Test loss_ridge() with 0.2 for all regression
coefficients.
betas <- rep(0.2, ncol(check_info_A$design_matrix))
loss_ridge(betas, check_info_A)
## [1] 111.2219
You will use the optim() function to manage the
optimization and thus fitting of the Ridge regression model. In truth,
Ridge regression has a closed form analytic expression for the \(\boldsymbol{\beta}\) estimates! The formula
has a lot in common with the Bayesian posterior mean formula assuming a
known \(\sigma\)! However, we will
focus on executing and interpreting the results rather than the
mathematical derivation in this assignment.
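Although the assignment uses optim(), the closed form is worth a quick sketch. For the glmnet-style loss \( \frac{1}{2} MSE + \lambda \sum_j \beta_j^2 \) (penalizing all coefficients, including the intercept, as in the loss function above), setting the gradient to zero gives \( \hat{\boldsymbol{\beta}} = \left( \mathbf{X}^T \mathbf{X} + 2 n \lambda \mathbf{I} \right)^{-1} \mathbf{X}^T \mathbf{y} \), where \(n\) is the number of training rows. The snippet below is a self-contained check on synthetic data (all names are local to this sketch, not the assignment objects):

```r
set.seed(101)

# synthetic design matrix (with intercept column) and response
n <- 40
X_demo <- cbind(1, matrix(rnorm(n * 3), ncol = 3))
y_demo <- as.vector(X_demo %*% c(1, 2, -1.5, 0.5) + rnorm(n, sd = 0.2))
lambda_demo <- 0.5

# glmnet-style ridge loss: (1/2)*MSE + lambda * sum(beta^2)
ridge_loss_demo <- function(b) {
  0.5 * mean((y_demo - X_demo %*% b)^2) + lambda_demo * sum(b^2)
}

# closed-form solution: (X'X + 2*n*lambda*I)^{-1} X'y
b_closed <- solve(t(X_demo) %*% X_demo + 2 * n * lambda_demo * diag(ncol(X_demo)),
                  t(X_demo) %*% y_demo)

# numerical solution via optim()
b_optim <- optim(rep(0, ncol(X_demo)), ridge_loss_demo, method = 'BFGS',
                 control = list(maxit = 5001))$par

# the two estimates should agree to optimizer tolerance
max(abs(b_optim - as.vector(b_closed)))
```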
You must define a function,
fit_regularized_regression(), to manage the optimization
for you. This function will be general and not specific to Ridge
regression. This way you can use this same function later in the
assignment to fit the Lasso regression model. The
fit_regularized_regression() function is started for you in
the code chunk below and has three arguments. The first,
lambda_use, is the assumed regularization strength
parameter value. The second, loss_func, is a function
handle for the loss function you are using to estimate the coefficients,
and the third argument, my_info, is the list of required
information.
The my_info argument is different from
check_info_A you defined previously. The
my_info argument in
fit_regularized_regression() does NOT include
$lambda. Instead, you must assign lambda_use
to the $lambda field. The purpose of
fit_regularized_regression() is to allow iterating over
many different values for \(\lambda\).
The bulk of fit_regularized_regression() is calling
optim() to execute the optimization. You must specify the
arguments correctly to the optim() function. Please note
that you are minimizing the loss and so you should
not specify fnscale in the
control argument to optim().
The last portion of fit_regularized_regression() is a
bookkeeping step which organizes the coefficient estimates in a manner
consistent with the broom::tidy() function.
Complete the code chunk below by specifying the initial guess
as 0 for all regression coefficients, assigning lambda_use
to the $lambda field, complete the arguments to the
optim() call, and complete the bookkeeping correctly. The
estimate column in the returned tibble is the
optimized estimate for the coefficients. The term column in
the returned tibble is the coefficient name. You do NOT
need to return the Hessian matrix from optim() and you
should specify the method as 'BFGS' and the
maximum number of iterations is 5001.
HINT: How can you identify the feature names associated with each coefficient?
HINT: Which of the returned elements from
optim() correspond to the optimized estimates?
fit_regularized_regression <- function(lambda_use, loss_func, my_info)
{
# create the initial guess of all zeros
start_guess <- rep(0, ncol(my_info$design_matrix))
# add the regularization strength to the list of required information
my_info$lambda <- lambda_use
fit <- optim(par = start_guess,
fn = loss_func,
method = "BFGS",
hessian = FALSE,
control = list(maxit = 5001),
my_info = my_info)
# return the regularized beta estimates
tibble::tibble(
term = colnames(my_info$design_matrix),
estimate = fit$par
) %>%
mutate(lambda = lambda_use)
}
You need to specify a new list of required information that only
contains the output and design matrix in order to test the
fit_regularized_regression() function.
Complete the first code chunk below by filling in the fields
to the list of required information. The $yobs field is the
“regular” R vector for the output. The $design_matrix is
the design matrix for the problem. You must use the original wide format
dfA_train to specify the output and create the design
matrix, and you should use linear additive features for all
inputs.
Complete the second code chunk below by fitting the Ridge
regularized model with a low regularization strength value of 0.0001.
Assign the result to the check_ridge_fit_lowpenalty object.
Display the glimpse() of
check_ridge_fit_lowpenalty to the screen.
reg_info <- list(
yobs = dfA_train$y,
design_matrix = model.matrix(y ~ ., data = dfA_train)
)
Fit the Ridge model with a regularization strength of 0.0001.
Add your code chunk here.
check_ridge_fit_lowpenalty <- fit_regularized_regression(
lambda_use = 0.0001,
loss_func = loss_ridge,
my_info = reg_info
)
glimpse(check_ridge_fit_lowpenalty)
## Rows: 76
## Columns: 3
## $ term <chr> "(Intercept)", "x01", "x02", "x03", "x04", "x05", "x06", "x07…
## $ estimate <dbl> -1.31028135, 0.49421026, -3.74499404, 7.62406592, -1.46362043…
## $ lambda <dbl> 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04…
The code chunk below is completed for you. It defines a function,
viz_compare_to_mles(), which compares the regularized
coefficient estimates to the MLEs. The first argument is a dataframe
(tibble) returned from the fit_regularized_regression()
function. The second argument is an lm() model object.
### create function to visualize the coefficient estimates
viz_compare_to_mles <- function(regularized_estimates, lm_mod)
{
regularized_estimates %>%
left_join(broom::tidy(lm_mod) %>%
select(term, mle = estimate),
by = 'term') %>%
pivot_longer(c("estimate", "mle")) %>%
filter(term != '(Intercept)') %>%
ggplot(mapping = aes(y = value, x = term)) +
geom_hline(yintercept = 0, color = 'grey30',
linewidth = 1.2, linetype = 'dashed') +
stat_summary(geom = 'linerange',
fun.max = 'max', fun.min = 'min',
mapping = aes(group = term),
color = 'black', linewidth = 1) +
geom_point(mapping = aes(color = name,
shape = name)) +
coord_flip() +
scale_shape_discrete('', solid = FALSE) +
scale_color_brewer('', palette = 'Set1') +
labs(y = 'coefficient estimate', x = 'coefficient') +
theme_bw()
}
Use the viz_compare_to_mles() function to
compare the Ridge regularized coefficients with the low \(\lambda\) value to the coefficient
MLEs.
How do they compare?
Add your code chunk here.
check_ridge_fit_lowpenalty %>%
viz_compare_to_mles(lm_mod = modA)
How do the coefficients compare?
When using a small lambda (0.0001), the Ridge regularized coefficients in the figure are very similar to the MLE coefficients.
Let’s now try a high regularization strength of 10000 and see the impact on the coefficient estimates.
Use the fit_regularized_regression() to fit the
Ridge regression model with a \(\lambda\) value of 10000. Assign the result
to the check_ridge_fit_highpenalty object. Use the
viz_compare_to_mles() function to compare the high
penalized Ridge coefficient estimates to the MLEs.
How do the highly penalized estimates compare to the MLEs?
Add your code chunks here.
check_ridge_fit_highpenalty <- fit_regularized_regression(
lambda_use = 10000,
loss_func = loss_ridge,
my_info = reg_info
)
check_ridge_fit_highpenalty %>% viz_compare_to_mles(lm_mod = modA)
How do the coefficients compare to the MLEs?
With a high regularization strength (10000), the Ridge regularized coefficients are much smaller than the MLEs; nearly all estimates are close to 0. The strong penalty term dominates the loss, so it shrinks the coefficient estimates toward zero and reduces the overall complexity of the model.
The first argument to the fit_regularized_regression()
function was intentionally set to lambda_use to allow
iterating over a sequence of \(\lambda\) values. This allows us to try out
dozens to hundreds of candidate \(\lambda\) values and see the regularization
effect on the coefficient estimates.
However, you must first create a sequence of candidate \(\lambda\) values to iterate over!
Define a sequence of lambda values and assign the result to
the lambda_grid variable in the first code chunk below. The
grid should be 101 evenly spaced points in the natural log scale. The
values in lambda_grid however must be in the original \(\lambda\) scale and thus NOT in the
log-scale. The lower bound should correspond to \(\lambda = 0.0001\) and the upper bound
should correspond to \(\lambda =
10000\).
The second code chunk executes the iteration for you with the
purrr::map_dfr() function. The map_dfr()
function applies the fit_regularized_regression() function
to each unique value in the lambda_grid
vector.
lambda_grid <- exp(seq(from = log(0.0001), to = log(10000), length.out = 101))
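A quick sanity check of the grid construction (recomputed here so the snippet stands alone):

```r
lambda_grid_check <- exp(seq(from = log(0.0001), to = log(10000), length.out = 101))

# 101 points, with the endpoints on the original lambda scale
length(lambda_grid_check)
range(lambda_grid_check)

# the spacing is constant on the natural log scale, not the original scale
log_gaps <- diff(log(lambda_grid_check))
isTRUE(all.equal(max(log_gaps), min(log_gaps)))
```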
The eval flag is set to FALSE in the code chunk
arguments below. Set eval=TRUE once you have properly
defined lambda_grid.
Please note: The code chunk below may take a few minutes to complete.
ridge_path_results <- purrr::map_dfr(lambda_grid,
fit_regularized_regression,
loss_func = loss_ridge,
my_info = reg_info)
It’s now time to visualize the coefficient path for
the Ridge regression model! The code chunk below is started for you. The
Intercept is removed to be consistent with the coefficient path
visualizations created by glmnet package.
Complete the code chunk below by piping the dataframe to
ggplot(). Map the log(lambda) to the
x aesthetic and the estimate to the
y aesthetic. Include a geom_line() layer and
map the term variable to the group aesthetic
within geom_line(). Set the alpha to
0.33.
ridge_path_results %>%
filter(term != '(Intercept)') %>%
ggplot(mapping = aes(x = log(lambda), y = estimate)) +
geom_line(mapping = aes(group = term), alpha = 0.33)
Describe the behavior of the coefficient paths as the regularization strength increases in the plot shown in 2h).
What do you think?
According to the figure, as lambda increases the coefficient paths shrink toward zero. As the regularization strength grows, the model becomes simpler and simpler, and eventually it loses a lot of important information.
As discussed in lecture the Ridge penalty is analogous to applying independent Gaussian priors in Bayesian linear models! However, the regularization penalty can take other forms besides the Ridge penalty. The LASSO penalty behaves differently than Ridge. This problem is focused on applying the Lasso penalty and comparing it to the behavior of the Ridge penalty.
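A one-dimensional sketch helps explain why the Lasso can switch coefficients off while Ridge only shrinks them. For a single coefficient with unpenalized estimate \(z\), minimizing \( \frac{1}{2} \left(b - z\right)^2 + \lambda \left|b\right| \) gives the standard soft-threshold solution \( \mathrm{sign}(z) \max\left(\left|z\right| - \lambda, 0\right) \), which is exactly zero whenever \( \left|z\right| \leq \lambda \). The Ridge analogue \( \frac{1}{2} \left(b - z\right)^2 + \lambda b^2 \) is minimized at \( z / \left(1 + 2\lambda\right) \), which never reaches zero. This 1D result is a standard derivation, not part of the assignment; the function names below are made up for illustration:

```r
# lasso: estimates become exactly zero once |z| <= lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# ridge: every estimate is shrunk by the same factor, never exactly zero
ridge_shrink <- function(z, lambda) z / (1 + 2 * lambda)

z <- c(-2, -0.3, 0.3, 2)
soft_threshold(z, lambda = 0.5)  # the two small estimates become exactly 0
ridge_shrink(z, lambda = 0.5)    # all estimates halved, none exactly 0
```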
You will begin by defining a loss function similar to the
loss_ridge() function you defined earlier. You will then
use this new function, loss_lasso(), to fit the Lasso
regression model.
The function, loss_lasso(), is started for you in the
code chunk below. The loss_lasso() function has the same
structure as the loss_ridge() function and thus has 2
arguments. The first, betas, is the “regular” R vector of
the regression coefficients and the second, my_info, is the
list of required information. You must assume that my_info
contains the same fields as the my_info list used within
loss_ridge().
Complete the code chunk below by calculating the mean trend, the model’s mean squared error (MSE) on the training set, and the LASSO penalty, and lastly return the effective Lasso loss. The comments and variable names below state what you should calculate at each line.
To check if you programmed loss_lasso()
correctly, test the function with a guess of -1.2 for all
regression coefficients and use the check_info_A list
defined earlier. If you programmed loss_lasso() correctly
you should get a value of 269.0714. As another test, use 0.2 for all
regression coefficients and use the check_info_A list
defined earlier. If you programmed loss_lasso() correctly
you should get a value of 123.3819.
loss_lasso <- function(betas, my_info)
{
# extract the design matrix
X <- my_info$design_matrix
# calculate linear predictor
mu <- as.vector(X %*% as.matrix(betas))
# calculate MSE
MSE <- mean((my_info$yobs - mu)^2)
# calculate LASSO penalty
penalty <- sum(abs(betas))
# return effective total loss
return(1/2 * MSE + my_info$lambda * penalty)
}
Test loss_lasso() with -1.2 for all regression
coefficients.
loss_lasso(rep(-1.2, ncol(check_info_A$design_matrix)), check_info_A)
## [1] 269.0714
Test loss_lasso() with 0.2 for all regression
coefficients.
loss_lasso(rep(0.2, ncol(check_info_A$design_matrix)), check_info_A)
## [1] 123.3819
You now have everything ready to fit the Lasso regression model! The
fit_regularized_regression() function was defined to be
general. You should use fit_regularized_regression() to fit
the Lasso model by assigning loss_lasso to the
loss_func argument. You must use the reg_info
list as the my_info argument. The user supplied
lambda_use value will then be used as the regularization
strength applied to the Lasso penalty.
Fit the Lasso regression model with a low penalty of 0.0001
and assign the result to the check_lasso_fit_lowpenalty
object.
After fitting, use the viz_compare_to_mles()
function to compare the Lasso estimates to the MLEs.
How do the Lasso estimates compare to the MLEs?
Add your code chunks here.
check_lasso_fit_lowpenalty <- fit_regularized_regression(
  lambda_use = 0.0001,
  loss_func = loss_lasso,
  my_info = reg_info
)
viz_compare_to_mles(check_lasso_fit_lowpenalty, lm_mod = modA)
How do they compare?
Compared to the results of 2e, the estimates in 3b are much more similar to the MLEs. However, some of the Lasso estimates are smaller in magnitude than the MLEs, reflecting the Lasso penalty’s tendency to promote sparsity in the model.
Next, let’s examine how a strong Lasso penalty impacts the coefficients.
Fit the Lasso regression model again but this time with a
high penalty of 10000 and assign the result to the
check_lasso_fit_highpenalty object.
After fitting, use the viz_compare_to_mles()
function to compare the Lasso estimates to the MLEs.
How do the Lasso estimates compare to the MLEs?
Add your code chunks here.
check_lasso_fit_highpenalty <- fit_regularized_regression(
  lambda_use = 10000,
  loss_func = loss_lasso,
  my_info = reg_info
)
viz_compare_to_mles(check_lasso_fit_highpenalty, lm_mod = modA)
The difference from the 2f results is not substantial: under the strong penalty, the Lasso estimates are still almost all close to 0.
Previously, you defined a sequence of \(\lambda\) values and assigned those values
to the lambda_grid object. This sequence serves as a candidate search grid. We do not know the best \(\lambda\) to use, so we try out many values!
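For reference, a log-spaced grid consistent with the results glimpsed later in this problem (101 values starting at 1e-4 and growing by a constant multiplicative factor) could be built as shown below. The exact endpoints are an assumption inferred from that output, so your earlier lambda_grid definition may differ:

```r
# a candidate grid of regularization strengths, evenly spaced on the
# log scale (endpoints assumed; your earlier lambda_grid may differ)
lambda_grid <- 10^seq(-4, 4, length.out = 101)
```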
The code chunk below is completed for you. The eval flag is
set to FALSE and so you must set eval=TRUE in the code
chunk options. The Lasso regression model is fit for each \(\lambda\) value in the search grid and the
coefficient estimates are returned within a dataframe (tibble). The
lasso_path_results dataframe includes the
lambda value, just as the ridge_path_results
object included the lambda value previously.
Please note: The code chunk below may take a few minutes to complete.
lasso_path_results <- purrr::map_dfr(lambda_grid,
                                     fit_regularized_regression,
                                     loss_func = loss_lasso,
                                     my_info = reg_info)
You will use the lasso_path_results to visualize the
Lasso coefficient path. As with the Ridge coefficient
path from earlier in the assignment, the Intercept is removed for you to
be consistent with the glmnet path visualizations.
Complete the code chunk below by piping the dataframe to
ggplot(). Map the log(lambda) to the
x aesthetic and the estimate to the
y aesthetic. Include a geom_line() layer and
map the term variable to the group aesthetic
within geom_line(). Set the alpha to
0.33.
How does the figure below compare to the figure created in Problem 2h)? How are the Lasso paths different from the Ridge paths?
lasso_path_results %>%
  filter(term != '(Intercept)') %>%
  ggplot(mapping = aes(x = log(lambda), y = estimate, group = term)) +
  geom_line(alpha = 0.33)
What do you think?
Compared with the figure in 2h, the Lasso paths change more abruptly, while the Ridge paths in 2h are smoother. Lasso can force certain coefficients exactly to 0, whereas Ridge regularization only shrinks them toward 0 without reaching it, so the Ridge paths evolve more smoothly.
Now that you have studied the behavior of the Ridge and Lasso penalties in greater depth, it’s time to tune the regularization strength. You will tune the regularization strength just for the Lasso model. The steps are the same for the Ridge model, but we will focus on Lasso in this question.
The code chunk below reads in another data set for you. This data set is a random test split from the same data the training set was created from. You will thus use this data set as the hold-out test set to assess or evaluate the Lasso model for each regularization strength in the search grid.
dfA_test <- readr::read_csv('hw09_probA_test.csv', col_names = TRUE)
## Rows: 24 Columns: 76
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (76): x01, x02, x03, x04, x05, x06, x07, x08, x09, x10, x11, x12, x13, x...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
A glimpse is provided below to show the variable names are the same
as those in dfA_train.
dfA_test %>% glimpse()
## Rows: 24
## Columns: 76
## $ x01 <dbl> 1.14313808, 0.01816419, 0.49591261, -1.49232633, -0.62059649, -0.0…
## $ x02 <dbl> -0.6464637, -0.4662545, -0.7916580, -0.7770189, 1.7378213, -0.4956…
## $ x03 <dbl> -1.89562312, -1.83545340, 0.58783552, -0.56476710, -0.15024320, 0.…
## $ x04 <dbl> -1.815775806, 0.281280533, -0.106889840, 1.066147596, 0.484055779,…
## $ x05 <dbl> -0.7128627, 1.4877917, -1.3365942, 2.0733856, -1.3475529, 0.331148…
## $ x06 <dbl> 1.49499607, -0.58088658, -2.78972454, -0.57166274, 0.03457907, 1.7…
## $ x07 <dbl> -1.17828630, -0.07111160, -0.64334020, -0.24136133, 0.55998293, 0.…
## $ x08 <dbl> -0.319216142, -0.314247113, -0.006775336, 0.472946659, -0.99712812…
## $ x09 <dbl> -0.92278608, -0.78961832, 0.16124184, -0.37300982, 0.31941510, -0.…
## $ x10 <dbl> 0.43104894, 0.06647602, -0.38361750, -0.98903565, -0.84237893, -0.…
## $ x11 <dbl> -0.75607015, -0.83118449, -1.30590588, 0.65576000, -0.32057164, 0.…
## $ x12 <dbl> -1.61445673, 2.23094618, 0.00335797, 0.63447707, -0.83379618, -2.1…
## $ x13 <dbl> -1.04732333, 0.94678115, -1.17904809, 0.01401656, 1.10230968, 2.84…
## $ x14 <dbl> 1.24779778, 2.21320896, -0.35418771, -2.29432424, 0.32813203, -0.3…
## $ x15 <dbl> 1.152395009, 0.871427100, -0.809043870, 1.068301785, -0.319201329,…
## $ x16 <dbl> -0.68759127, 0.22346479, 0.27785233, 0.63106951, -1.20893271, -0.9…
## $ x17 <dbl> -1.32498321, 0.99321252, 0.54829226, -1.98481617, -0.68417370, 2.1…
## $ x18 <dbl> 0.57914293, 1.04769622, -1.77207000, 0.82656793, 0.06793236, -0.26…
## $ x19 <dbl> -2.7324805, 0.8359603, 0.2268546, 0.1402093, -0.7464301, -0.554660…
## $ x20 <dbl> -0.13225186, -0.11301496, 1.45685079, 1.08894467, 0.52618785, 0.48…
## $ x21 <dbl> -0.05511845, 0.67435934, -1.23584780, -0.52775922, -2.39672927, -0…
## $ x22 <dbl> 0.376887793, 1.143383468, 0.347587585, 0.366396727, 1.204298906, 1…
## $ x23 <dbl> -0.02315765, -0.17996268, -0.06702020, -0.96745663, 1.05621951, 0.…
## $ x24 <dbl> -0.79364619, -0.75141178, -1.44564412, -0.26952336, -0.07881057, -…
## $ x25 <dbl> -1.1762732, 0.2858181, -1.9007820, 1.7190158, -2.1020462, 0.521933…
## $ x26 <dbl> 0.03440645, -0.16521310, 1.37251771, 0.15838025, 0.36806501, 0.481…
## $ x27 <dbl> 0.3555787, 0.7234963, -0.0119092, 0.5693598, 0.6075264, 0.9278386,…
## $ x28 <dbl> 0.46563934, 0.63759021, 0.69499926, -1.22618311, 0.45442219, 0.330…
## $ x29 <dbl> -0.30017643, -1.03137258, -0.76110963, 1.01467089, 1.30270574, -0.…
## $ x30 <dbl> 0.45063846, -0.27610195, 0.02663231, 0.23689472, -0.49330443, 0.29…
## $ x31 <dbl> 0.37249149, 0.99151848, -1.45214015, -1.20993730, 0.95817968, 0.47…
## $ x32 <dbl> 0.160626608, 0.298804786, 1.392771898, 2.397480316, -0.350553982, …
## $ x33 <dbl> 0.6959056, -0.5876913, 1.0865048, -0.2956193, 1.1790686, 0.2194538…
## $ x34 <dbl> -1.286318824, -0.573059103, 0.709131820, 0.603993658, -0.009135101…
## $ x35 <dbl> -0.029824439, 0.120974022, -0.048754275, 1.998250597, -0.285683304…
## $ x36 <dbl> 1.74619464, -1.15938972, 0.52561374, -1.41721149, 0.45313713, -0.5…
## $ x37 <dbl> -1.6102714, 1.2100254, -0.8188211, 0.4097319, 0.3564670, 1.4065122…
## $ x38 <dbl> -0.52626336, -0.13884224, -0.97867868, 1.97088576, 0.99121364, 0.9…
## $ x39 <dbl> 0.47424567, 0.16380496, 0.09808709, -0.15455438, 0.38395056, 0.215…
## $ x40 <dbl> 1.04905686, -1.12755696, 0.81950481, -0.62597926, 0.18891435, 0.27…
## $ x41 <dbl> 0.9802753, 1.0727297, -0.1034688, -0.7738019, 1.4521328, 0.5824250…
## $ x42 <dbl> 0.8026953, 0.3912334, 0.2311717, 0.4416862, 0.6058927, 1.3619968, …
## $ x43 <dbl> 1.497014705, 0.199461083, 0.877037014, 1.275111836, -1.463546089, …
## $ x44 <dbl> 0.88361734, -0.43124514, 0.76909353, 0.34181375, 0.84916573, 0.527…
## $ x45 <dbl> -0.08048870, -0.87731052, -1.23497116, 1.45209109, 0.17710336, -0.…
## $ x46 <dbl> -0.22229459, 1.39743346, 1.46384916, 0.03334550, 0.73187037, 0.790…
## $ x47 <dbl> -1.7476152825, -0.2879971901, 1.7775635629, -0.8658681201, -0.1762…
## $ x48 <dbl> 1.07512535, -1.42892804, 1.14830168, -1.15034090, 1.14859354, -0.8…
## $ x49 <dbl> 0.11297505, -0.33298760, 2.62739250, -0.06144636, -0.62081207, -0.…
## $ x50 <dbl> -1.760714e+00, 1.546895e-01, -1.336308e+00, -5.341918e-01, -1.3781…
## $ x51 <dbl> 0.355802760, 1.261605721, -1.007393498, -0.229887771, -0.494385689…
## $ x52 <dbl> 1.12843542, 0.18742887, -0.15268669, -1.69652901, 1.00424500, -0.3…
## $ x53 <dbl> 0.2612449, 0.4780664, -0.2110005, -0.5120048, -1.3017326, 0.713045…
## $ x54 <dbl> -0.38075761, 0.84135339, 0.19037086, -0.80044132, -0.02384702, 0.1…
## $ x55 <dbl> 2.491695857, 0.463015446, 0.592801784, -0.513619781, 1.192908697, …
## $ x56 <dbl> 0.6072242, -0.6014201, -0.8298660, 1.6731066, -1.1881461, 0.800793…
## $ x57 <dbl> 0.562299802, 1.513023999, -0.919529890, -0.637432106, -1.079789149…
## $ x58 <dbl> -1.90164605, 1.32513196, -1.32907538, -0.06284810, 0.57858234, -0.…
## $ x59 <dbl> -0.678017763, 0.671531331, 0.373627202, -1.360163880, -0.798847302…
## $ x60 <dbl> -1.01817726, 0.05251363, -0.29394473, 0.89162065, 0.84220139, -0.7…
## $ x61 <dbl> -0.309767250, -1.295571666, 0.476181835, 0.987615256, -0.421788553…
## $ x62 <dbl> 0.94130874, 1.49237811, 1.48027176, 1.49906080, -1.18024281, 0.061…
## $ x63 <dbl> 1.117138820, 1.783084869, -1.758350099, 0.321090650, 0.156595164, …
## $ x64 <dbl> 0.01336110, -2.61963800, 0.23684126, -1.75412865, 1.56489203, -1.1…
## $ x65 <dbl> 1.16813558, 1.27909451, -0.23801311, -1.55946227, 0.47303683, 0.84…
## $ x66 <dbl> -0.68179048, -0.62455405, -1.04955883, 0.08481691, 0.13387617, 2.8…
## $ x67 <dbl> -0.107278267, 0.014839010, 0.584834981, -2.277171121, 2.358414125,…
## $ x68 <dbl> -0.34835071, -0.01205553, 0.15787720, -1.72791685, -0.23807589, -0…
## $ x69 <dbl> 0.891925590, 1.195514481, 0.339961334, -0.246669111, -1.672261892,…
## $ x70 <dbl> 0.81289142, 0.74375772, -0.62796234, -2.20482715, -1.21972263, 1.4…
## $ x71 <dbl> 2.27572777, 1.19452221, -0.40839006, -0.60256595, -0.83067145, 1.3…
## $ x72 <dbl> -0.69807238, 0.31327920, -0.15541045, -2.06316223, -1.24371309, 0.…
## $ x73 <dbl> 0.17105096, -0.24375675, -1.56184416, -0.09647939, 0.38517769, 0.8…
## $ x74 <dbl> 0.40569893, -0.26919348, -1.22765740, -0.78351810, -0.51345731, -1…
## $ x75 <dbl> -0.15896868, -0.73370485, -0.40814114, 0.41869379, 0.52360174, 0.6…
## $ y <dbl> -19.9840444, -21.5075154, 11.6157884, -4.8924499, -17.1137287, 11.…
You will execute fitting the Lasso model on the training set and then
testing the Lasso model on the hold-out test set. You will “score” the
model by calculating the hold-out test MSE for each \(\lambda\) value in the search grid. You
already fit the Lasso model on the training set, dfA_train,
for each value in lambda_grid. However, you will create a
function, train_and_test_regularized(), which executes both
the training and testing actions. Thus, your function,
train_and_test_regularized(), will do everything required
to evaluate the model performance. Your function is therefore analogous
to the cv.glmnet() function, the
caret::train() function, and the training and testing
functions within tidymodels!
The train_and_test_regularized() function is started for
you in the code chunk below. It consists of 4 arguments. The first,
lambda_use, is the user specified regularization strength.
The second, train_data, is the training data set. The
third, test_data, is the hold-out test data set. The last
argument, loss_func, is the loss function used to fit the
model. Notice that you do not provide the lists of required information
to train_and_test_regularized(). Instead, the data are
provided and train_and_test_regularized() defines the list
of required information!
It is important to note that we need to standardize the training data before fitting regularized models. The testing data must then be standardized based on the training data. The dfA_test data provided to you has already been appropriately pre-processed, so you do not need to worry about that here. This lets you focus on training and assessing the model performance.
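Although the pre-processing is done for you, the idea can be sketched as follows. This is a hypothetical illustration with made-up data (raw_train, raw_test, and the x01 column are placeholders, not the assignment's actual objects), showing the test set being centered and scaled with the training set statistics:

```r
library(dplyr)

# hypothetical raw (unstandardized) train/test splits for illustration
raw_train <- tibble::tibble(x01 = c(2, 4, 6, 8), y = c(1, 2, 3, 4))
raw_test  <- tibble::tibble(x01 = c(3, 5),       y = c(1.5, 2.5))

# compute centering/scaling statistics from the TRAINING set only
train_means <- raw_train %>% summarise(across(-y, mean))
train_sds   <- raw_train %>% summarise(across(-y, sd))

# apply the TRAINING statistics to the TEST set
standardized_test <- raw_test %>%
  mutate(across(-y, ~ (.x - train_means[[cur_column()]]) / train_sds[[cur_column()]]))
# the second test value (5) equals the training mean of x01, so it maps to 0
```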
Complete the code chunk below to define the
train_and_test_regularized() function. The comments and
variable names state the actions you should perform within each line of
code.
Remember that we are working with linear additive features for all inputs.
HINT: Remember that you defined a function to allow fitting the model for a given \(\lambda\) value previously…
train_and_test_regularized <- function(lambda_use, train_data, test_data, loss_func)
{
  # make the design matrix on the TRAINING set
  Xtrain <- model.matrix(y ~ ., data = train_data)
  
  # make the list of required information for the TRAINING set
  info_train <- list(
    yobs = train_data$y,
    design_matrix = Xtrain
  )
  
  # train the model
  train_results <- fit_regularized_regression(lambda_use, loss_func, info_train)
  
  # extract the training set coefficient estimates (completed for you)
  beta_estimates <- train_results %>% pull(estimate)
  
  # make the design matrix on the TEST set
  Xtest <- model.matrix(y ~ ., data = test_data)
  
  # predict the trend on the TEST data
  mu_test <- as.vector(Xtest %*% as.matrix(beta_estimates))
  
  # calculate the MSE on the TEST set
  MSE_test <- mean((test_data$y - mu_test)^2)
  
  # book keeping (completed for you)
  list(lambda = lambda_use,
       MSE_test = MSE_test)
}
The code chunk below is completed for you. It applies the
train_and_test_regularized() function to each \(\lambda\) value in the
lambda_grid search grid. The loss_lasso
function is assigned to the loss_func argument and thus the
Lasso model is trained and tested for each regularization strength
candidate value. The eval flag is set to FALSE and so you
must set eval=TRUE in the code chunk options.
Please note: The code chunk below may take a few minutes to complete.
train_test_lasso_results <- purrr::map_dfr(lambda_grid,
                                           train_and_test_regularized,
                                           train_data = dfA_train,
                                           test_data = dfA_test,
                                           loss_func = loss_lasso)
A glimpse of the train/test assessment results is shown to the screen
below. The result is a dataframe (tibble) with two columns. The
lambda column is the regularization strength and
MSE_test is the MSE on the test set.
train_test_lasso_results %>% glimpse()
## Rows: 101
## Columns: 2
## $ lambda <dbl> 0.0001000000, 0.0001202264, 0.0001445440, 0.0001737801, 0.000…
## $ MSE_test <dbl> 788.0273, 787.6335, 787.1600, 786.6299, 786.0216, 785.3100, 7…
You must visualize the test set performance by plotting the hold-out test MSE with respect to the log of the regularization strength.
Create a line plot in ggplot2 to visualize the
hold-out test MSE vs the log of the regularization
strength.
Which \(\lambda\) value yields the best test set performance?
Add your code chunks here.
train_test_lasso_results %>%
  ggplot(aes(x = log(lambda), y = MSE_test)) +
  geom_line() +
  geom_point()
Which \(\lambda\) is the best?
best_lasso <- train_test_lasso_results %>%
  filter(MSE_test == min(MSE_test))
best_lasso
## # A tibble: 1 × 2
## lambda MSE_test
## <dbl> <dbl>
## 1 2.09 133.
As lambda increases, the hold-out test MSE first decreases and then increases. The MSE reaches its minimum at lambda = 2.089296, which is therefore the \(\lambda\) value with the best test set performance.
You must now compare the Lasso estimates associated with the best \(\lambda\) value to the MLEs.
HINT: Remember that you already calculated the Lasso
estimates for all \(\lambda\) values
within the lasso_path_results object. You just need to
filter that object for the appropriate lambda value to
identify the best Lasso estimates.
Add your code chunks here.
best_lasso_estimates <- lasso_path_results %>%
  filter(lambda == best_lasso$lambda)

viz_compare_to_mles(best_lasso_estimates, lm_mod = modA)
You just used a train/test split to validate and tune a machine learning model! However, a single train/test split does not allow us to estimate the reliability of the performance. We cannot state how confident we are in the identified best \(\lambda\) value in this case. Resampling methods, such as cross-validation, provide more robust and stable results than a single train/test split.
However, you will not execute the resampling manually. You will use
the cv.glmnet() function from the glmnet
library to manage the cross-validation and “scoring” for you.
The glmnet library is loaded for you in the code chunk
below.
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loaded glmnet 4.1-7
The dfA_train and dfA_test splits were
randomly created by splitting a larger data set with an 80/20 train/test
split. The code chunk below binds the rows together, recreating the
original data set. This allows cv.glmnet() to manage
splitting the original 120 observations rather than the smaller randomly
created training set.
dfA <- dfA_train %>% bind_rows(dfA_test)
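To see what cv.glmnet() will manage for you, here is a minimal sketch of assigning 5-fold membership to the recombined observations. The seed and variable names here are arbitrary and purely illustrative:

```r
# sketch: randomly assign each observation to one of 5 folds
set.seed(4321)              # arbitrary seed for illustration only
n_obs   <- 120              # dfA has 120 rows (96 train + 24 test)
n_folds <- 5
fold_id <- sample(rep(1:n_folds, length.out = n_obs))

# each fold takes one turn as the hold-out set while the remaining
# folds train the model; the per-fold hold-out MSEs are then averaged
table(fold_id)              # 24 observations per fold
```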
Let’s start out by fitting the Lasso model directly with
glmnet before executing the cross-validation. This way we
can compare our previous Lasso path to the glmnet path!
You need to create the design matrix consistent with the
glmnet requirements before fitting the glmnet
model. You must therefore create a new design matrix for the linear
additive features for all inputs but remove the
intercept column from the design matrix.
Complete the code chunk below by defining the
glmnet required design matrix, extracting the response
vector, and then fitting the Lasso model with
glmnet.
Assign the lambda_grid object you created
previously to the lambda argument within
glmnet(). You are therefore using a custom
lambda grid rather than the default set of
lambda values.
Xenet <- model.matrix(y ~ . - 1, data = dfA)
yenet <- dfA$y
lasso_glmnet <- glmnet(x = Xenet, y = yenet, alpha = 1, lambda = lambda_grid)
Use the glmnet default plot method to visualize
the coefficient path for the glmnet trained Lasso model.
You must assign the xvar argument in the plot function call
to 'lambda' to show the path consistent with your previous
figures.
How does the glmnet created coefficient path
compare to Lasso coefficient path you created in Problem
3d)?
Add your code chunks here.
plot(lasso_glmnet, xvar = 'lambda', label = TRUE)
How do they compare?
The coefficient path obtained here is almost identical to the path created in 3d.
Now it’s time to tune the Lasso model with cross-validation. You will
specifically use 5 fold cross-validation and you must use the same \(\lambda\) search grid defined previously
within lambda_grid.
Use the cv.glmnet() function to tune the Lasso
model with 5-fold cross-validation and the defined search grid. Assign
the lambda_grid object to the lambda argument
within cv.glmnet(). Assign the result to the
lasso_cv object.
Plot the cross-validation results to the screen.
How many non-zero features does the best Lasso model have? How many non-zero features does the Lasso model recommended by the 1-standard error rule have?
NOTE: The random seed is set for you below to force the splits to be the same every time you run the code chunk.
set.seed(71231)
lasso_cv <- cv.glmnet(Xenet, yenet, alpha = 1, nfolds = 5, lambda = lambda_grid)
plot(lasso_cv)
best_lambda <- lasso_cv$lambda.min
model_best_lambda <- glmnet(Xenet, yenet, alpha = 1, lambda = best_lambda)
coef_best_lambda <- coef(model_best_lambda)
nonzero_best_lambda <- sum(coef_best_lambda != 0) - 1
lambda_1se <- lasso_cv$lambda.1se
model_1se <- glmnet(Xenet, yenet, alpha = 1, lambda = lambda_1se)
coef_1se <- coef(model_1se)
nonzero_1se <- sum(coef_1se != 0) - 1
nonzero_best_lambda
## [1] 24
nonzero_1se
## [1] 4
The best Lasso model (lambda.min) has 24 non-zero features, while the model selected by the 1-standard-error rule (lambda.1se, the largest \(\lambda\) whose cross-validated error is within one standard error of the minimum) has only 4 non-zero features.
Which inputs matter based on the tuned Lasso model?
Add your code chunks here and what do you think?
coef(lasso_cv) %>% as.matrix() %>% as.data.frame() %>% filter(s1 != 0)
## s1
## (Intercept) 0.1761156
## x03 3.8735176
## x07 -4.1879725
## x21 1.4492613
## x33 -0.9854943
Note that coef() applied to a cv.glmnet object uses s = 'lambda.1se' by default. Based on these results, x03, x07, x21, and x33 are the important inputs.
Correlated features cause problems for many models. One way to identify whether feature correlation matters is to tune the elastic net model. The elastic net blends the Ridge and Lasso penalties and involves two tuning parameters. The regularization strength is the same parameter you worked with in the previous problems. The mixing fraction is the weight applied to blend the Ridge and Lasso penalties. Tuning the mixing fraction will help us identify whether feature correlation impacts the model behavior.
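For reference, glmnet (used later in this problem) parameterizes the elastic net with a mixing fraction \(\alpha\), where \(\alpha = 1\) recovers the Lasso and \(\alpha = 0\) recovers Ridge. Per the glmnet documentation, the objective minimized for a Gaussian response is:

```latex
\min_{\beta_0, \boldsymbol{\beta}} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - \mathbf{x}_i^{\top} \boldsymbol{\beta} \right)^2 + \lambda \left[ \frac{1-\alpha}{2} \left\lVert \boldsymbol{\beta} \right\rVert_2^2 + \alpha \left\lVert \boldsymbol{\beta} \right\rVert_1 \right]
```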
The code chunk below reads in a data set that will let you practice working with correlated features.
dfB <- readr::read_csv('hw09_probB.csv', col_names = TRUE)
## Rows: 110 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): x1, x2, x3, x4, x5, x6, x7, x8, x9, y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The glimpse provided below shows there are 9 continuous inputs and 1 continuous response.
dfB %>% glimpse()
## Rows: 110
## Columns: 10
## $ x1 <dbl> 3.09869212, -0.27058881, 0.18559238, 0.16634118, -1.23293151, -1.36…
## $ x2 <dbl> 2.94214328, -1.17373978, -0.01076317, -0.65687092, -1.14843444, 0.1…
## $ x3 <dbl> 3.28268099, 0.15014082, -0.27188083, -0.75522974, -0.90504324, -0.0…
## $ x4 <dbl> 2.37049510, -0.55569220, -0.06411316, -1.20347199, -0.67747417, -0.…
## $ x5 <dbl> 2.95419609, -0.69823484, 0.23546653, -0.97386688, -1.44499383, -0.1…
## $ x6 <dbl> 3.5092130, -0.8562281, 0.6292396, -0.4149860, -1.5292260, 0.1067256…
## $ x7 <dbl> 1.58927120, -1.28057044, -0.21849387, -1.41114837, -0.11611693, 1.1…
## $ x8 <dbl> 2.9937227, -1.6100189, 0.5300147, -0.5169738, -0.3193423, -0.208050…
## $ x9 <dbl> 2.19334318, -0.58670454, 0.70500888, -0.25869602, -0.05056980, 0.13…
## $ y <dbl> -20.84643348, -5.89326160, 0.08635077, -5.83994552, 5.77691663, 1.4…
The reshaped long-format data set is created for you in the code
chunk below. The response y is not stacked and thus the
long-format includes all inputs, x1 through x9
stacked together.
lfB <- dfB %>%
  tibble::rowid_to_column() %>%
  pivot_longer(starts_with('x'))
lfB %>% glimpse()
## Rows: 990
## Columns: 4
## $ rowid <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3…
## $ y <dbl> -20.84643348, -20.84643348, -20.84643348, -20.84643348, -20.8464…
## $ name <chr> "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x1", "x2"…
## $ value <dbl> 3.09869212, 2.94214328, 3.28268099, 2.37049510, 2.95419609, 3.50…
Create a scatter plot to show the relationship between the output and each input. Use the reshaped long-format data set to create facets for each input.
Add your code chunks here.
lfB %>%
  ggplot(aes(x = value, y = y)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~name, scales = "free")
You visualized the output to input relationships, but you must also
examine the relationship between the inputs! The cor()
function can be used to calculate the correlation matrix between all
columns in a data frame.
Correlation matrices are created from wide-format data and so you
will use the original wide-format data, dfB, for this
question instead of the long-format data, lfB.
Pipe dfB into select() and select
all columns except the response y. Pipe the result to the
cor() function and display the correlation matrix to
screen.
Add your code chunks here.
cor_matrix <- dfB %>% select(-y) %>% cor()
cor_matrix
## x1 x2 x3 x4 x5 x6 x7
## x1 1.0000000 0.9100358 0.8325641 0.8625994 0.8706074 0.9176084 0.7733840
## x2 0.9100358 1.0000000 0.8563563 0.8754078 0.8964453 0.9164890 0.7798010
## x3 0.8325641 0.8563563 1.0000000 0.8207980 0.8225810 0.8381175 0.6932513
## x4 0.8625994 0.8754078 0.8207980 1.0000000 0.8114702 0.8668589 0.7536997
## x5 0.8706074 0.8964453 0.8225810 0.8114702 1.0000000 0.8677033 0.7503743
## x6 0.9176084 0.9164890 0.8381175 0.8668589 0.8677033 1.0000000 0.7545994
## x7 0.7733840 0.7798010 0.6932513 0.7536997 0.7503743 0.7545994 1.0000000
## x8 0.7661651 0.7683296 0.6959127 0.7647050 0.7307950 0.7661827 0.6671351
## x9 0.7623510 0.7709967 0.7050866 0.7632158 0.7416818 0.7680993 0.6617914
## x8 x9
## x1 0.7661651 0.7623510
## x2 0.7683296 0.7709967
## x3 0.6959127 0.7050866
## x4 0.7647050 0.7632158
## x5 0.7307950 0.7416818
## x6 0.7661827 0.7680993
## x7 0.6671351 0.6617914
## x8 1.0000000 0.6664688
## x9 0.6664688 1.0000000
Rather than displaying the numbers of the correlation matrix, let’s
create a correlation plot to visualize the correlation coefficient
between each pair of inputs. The corrplot package provides
the corrplot() function to easily create clean and simple
correlation plots. You do not have to load the corrplot
package, instead you will call the corrplot() function from
corrplot using the :: operator. Thus, you will
call the function as corrplot::corrplot().
The first argument to corrplot::corrplot() is a
correlation matrix. You must therefore calculate the correlation matrix
associated with a data frame and pass that matrix into
corrplot::corrplot().
You will create two correlation plots. For the first, use the
default input arguments to corrplot::corrplot(). For the
second, assign the type argument to 'upper' to
visualize the correlation plot as an upper-triangular
matrix.
Based on your visualizations, are the inputs correlated?
Add your code chunks here.
corrplot::corrplot(cor_matrix)

corrplot::corrplot(cor_matrix, type = "upper")
Based on the correlation plots, the inputs are clearly correlated. In addition, the dark colors indicate that the correlations are quite strong.
In lecture we discussed that we can easily create more features than inputs by considering interactions between the inputs! Thus, when exploring if feature correlation could impact model results, we should not simply examine the correlation between the inputs. We should also examine the correlation between the features derived from the inputs!
For this problem, let’s work with all pairwise interactions between the inputs. You should therefore create a model that has all main effects and all pairwise products between the inputs.
Define the design matrix that includes all main effects and
pairwise products between the inputs associated with dfB.
Assign the result to the XBpairs object.
We are focused on exploring the relationship between the features and so you must remove the intercept column of ones from this design matrix.
How many columns are in the design matrix?
Add your code chunks here.
XBpairs <- model.matrix(y ~ . * . - 1, data = dfB)
dim(XBpairs)
## [1] 110 45
There are 45 columns in the design matrix: 9 main effects plus choose(9, 2) = 36 pairwise interactions.
Create the correlation plot between all features in the
XBpairs design matrix. Assign the type
argument to 'upper' and set the method
argument to 'square' in corrplot::corrplot()
to visualize the correlation coefficients as colored square “tiles”
within an upper triangular matrix.
HINT: Remember that the correlation matrix must be
calculated first before calling corrplot::corrplot()!
Add your code chunks here.
corrplot::corrplot(cor(XBpairs), type = "upper", method = 'square')
The corrplot::corrplot() function can reorder
correlation plots to group correlated variables in clusters. This can
help us “see” correlation structure easier compared to keeping the
variables in their original order.
Create the correlation plot between all features in the
XBpairs design matrix again. However, you should set the
order argument to 'hclust' and the
hclust.method argument to 'ward.D2'. Continue
to set method to 'square' and
type to 'upper'.
HINT: Remember that the correlation matrix must be
calculated first before calling corrplot::corrplot()!
Add your code chunks here.
corrplot::corrplot(cor(XBpairs), type = 'upper', method = 'square',
                   order = 'hclust', hclust.method = 'ward.D2')
You will now use caret to tune the mixing fraction and
regularization strength of the elastic net model. The caret
library is loaded for you in the code chunk below.
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
You must specify the resampling scheme that caret will use to train,
assess, and tune a model. You worked with caret in earlier
assignments and there are several examples provided in lecture if you
need additional help.
Specify the resampling scheme to be 5 fold with 5 repeats.
Assign the result of the trainControl() function to the
my_ctrl object. Specify the primary performance metric to
be 'RMSE' and assign that to the my_metric
object.
Add your code chunks here.
my_ctrl <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 5)
my_metric <- "RMSE"
You must train, assess, and tune an elastic net model using the default
caret tuning grid. In the caret::train()
function you must use the formula interface to specify a model with all
pairwise interactions to predict the response y. Assign
the method argument to 'glmnet' and set the metric argument
to my_metric. You must also instruct
caret to standardize the features by setting the
preProcess argument equal to
c('center', 'scale'). Assign the trControl
argument to the my_ctrl object.
Important: The caret::train() function
works with the formula interface. Thus, even though you are using
glmnet to fit the model, caret does
NOT require you to organize the design matrix as
required by glmnet! Thus, you do NOT need
to remove the intercept when defining the formula to
caret::train()!
Train, assess, and tune the glmnet elastic net
model consisting of main effects and all pairwise interactions with the
defined resampling scheme. Assign the result to the
enet_default object and display the result to the
screen.
Which tuning parameter combinations are considered to be the best?
Is the best set of tuning parameters more consistent with Lasso or Ridge regression?
Does the feature correlation seem to be impacting model behavior in this application?
The random seed is set for you for reproducibility.
set.seed(1234)
enet_default <- train(y ~ (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)^2,
                      data = dfB,
                      method = "glmnet",
                      metric = my_metric,
                      preProcess = c("center", "scale"),
                      trControl = my_ctrl)
enet_default
## glmnet
##
## 110 samples
## 9 predictor
##
## Pre-processing: centered (45), scaled (45)
## Resampling: Cross-Validated (5 fold, repeated 5 times)
## Summary of sample sizes: 87, 88, 87, 89, 89, 88, ...
## Resampling results across tuning parameters:
##
## alpha lambda RMSE Rsquared MAE
## 0.10 0.01798082 4.921227 0.8038755 3.879746
## 0.10 0.17980820 4.364162 0.8387418 3.483446
## 0.10 1.79808197 4.977807 0.7785130 4.067109
## 0.55 0.01798082 4.752239 0.8156402 3.750969
## 0.55 0.17980820 4.207071 0.8493438 3.406831
## 0.55 1.79808197 5.370753 0.7453470 4.331699
## 1.00 0.01798082 4.677062 0.8195274 3.690341
## 1.00 0.17980820 4.260269 0.8434932 3.452308
## 1.00 1.79808197 5.537609 0.7496795 4.420238
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0.55 and lambda = 0.1798082.
The best tuning parameter combination is alpha = 0.55 and lambda = 0.1798082. Since alpha = 0.55 sits roughly halfway between ridge (alpha = 0) and lasso (alpha = 1), the best model is not strongly consistent with either extreme; it blends the two penalties. The fact that a substantial ridge component helps suggests the feature correlation is impacting model behavior, since the ridge penalty is better suited to handling correlated inputs.
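By default, caret only searches a small 3-by-3 grid of alpha and lambda values. If we wanted a finer search around the best default combination, we could supply our own grid through the `tuneGrid` argument. A minimal sketch, assuming `my_metric` and `my_ctrl` are defined as earlier in the assignment (the particular alpha and lambda values below are illustrative choices, not part of the assignment):

```r
# Sketch: refine the elastic net search with a custom tuning grid.
# The alpha/lambda values are illustrative, not prescribed by the assignment.
enet_grid <- expand.grid(alpha = seq(0.1, 1, by = 0.15),
                         lambda = exp(seq(log(0.01), log(1), length.out = 25)))

set.seed(1234)
enet_tune <- train(y ~ (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)^2,
                   data = dfB,
                   method = "glmnet",
                   metric = my_metric,
                   tuneGrid = enet_grid,
                   preProcess = c("center", "scale"),
                   trControl = my_ctrl)

# Best alpha/lambda combination found over the refined grid
enet_tune$bestTune
```

Because lambda acts multiplicatively on the penalty, a log-spaced lambda sequence usually covers the useful range more evenly than a linear one.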
Display the coefficients to the screen for the tuned elastic net model. Which coefficients are non-zero?
Add your code chunks here.
best_tuned_model <- enet_default$finalModel
coefficients <- coef(best_tuned_model, s = enet_default$bestTune$lambda) %>%
as.matrix() %>%
as.data.frame() %>%
filter(s1 != 0)
coefficients
## s1
## (Intercept) -3.9787738
## x1 0.4709670
## x4 -1.0551157
## x7 -1.2677801
## x8 1.7497374
## x9 -0.6197409
## x1:x8 -1.4007389
## x2:x6 0.9468922
## x2:x8 -0.6740257
## x2:x9 1.1333283
## x3:x8 -0.6640331
## x4:x7 -1.8214194
## x5:x7 1.3474542
## x6:x8 -0.2226738
## x6:x9 1.6007788
## x7:x8 -5.2821782
## x7:x9 3.6603517
## x8:x9 -7.8244994
There are 18 non-zero coefficients (including the intercept), which are displayed in the table above.
caret provides several useful helper functions for
ranking the features based on their influence on the response. This is
known as ranking the variable importances and the varImp()
function will extract the variable importances from a model for you.
Wrapping the varImp() result with plot() will
create a figure showing the variable importances. By default, the
displayed importance values are in a relative scale with 100
corresponding to the most important variable.
Plot the variable importances for the caret
tuned elastic net model. Are the rankings consistent with the magnitude
of the coefficients you printed to the screen in Problem
5ei)?
Add your code chunks here.
variable_importances <- varImp(enet_default)
plot(variable_importances)
Yes, the rankings are consistent. The features with the largest absolute coefficients (such as x8:x9, x7:x8, and x7:x9) also receive the highest variable importance scores.
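One way to check this consistency directly is to put the two rankings side by side: the importance table from `varImp()` and the absolute values of the tuned model's coefficients. A sketch, assuming `enet_default` is the tuned object from above:

```r
# Sketch: compare varImp() rankings against the absolute coefficients
# from the tuned elastic net model.
imp_tbl <- varImp(enet_default)$importance %>%
  tibble::rownames_to_column("feature") %>%
  arrange(desc(Overall))

coef_tbl <- coef(enet_default$finalModel, s = enet_default$bestTune$lambda) %>%
  as.matrix() %>%
  as.data.frame() %>%
  tibble::rownames_to_column("feature") %>%
  filter(feature != "(Intercept)") %>%
  mutate(abs_coef = abs(s1)) %>%
  arrange(desc(abs_coef))

# The top features in both tables should appear in the same order
head(imp_tbl)
head(coef_tbl)
```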
We have focused heavily on working with continuous inputs this
semester. However, linear models may also use categorical
inputs as features. This problem introduces working with
categorical inputs by fitting non-Bayesian linear models with
lm().
The data you will work with is loaded for you in the code chunk below.
dfC <- readr::read_csv('hw09_probC.csv', col_names = TRUE)
## Rows: 75 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): v
## dbl (2): x, y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The glimpse shows there are two inputs, x and
v, and one continuous response y. The
v input is non-numeric.
dfC %>% glimpse()
## Rows: 75
## Columns: 3
## $ x <dbl> -1.65522340, -0.40027559, -0.80346451, 0.76686259, 0.93237497, 0.010…
## $ v <chr> "A", "A", "A", "B", "D", "D", "D", "A", "C", "D", "C", "A", "B", "A"…
## $ y <dbl> -1.0995699, -2.0743885, -2.2994641, 1.5491779, -0.1327855, -0.381488…
Create a bar chart in ggplot2 to show the counts
associated with each unique value of the categorical variable
v.
Add your code chunks here.
dfC %>% ggplot(aes(x = v)) +
geom_bar()
Next, let’s focus on the relationship between the continuous output
y and the continuous input x.
Create a scatter plot in ggplot2 to the show
relationship between y and x.
Manually assign the marker size to
4.
Add your code chunks here.
dfC %>% ggplot(aes(x = x, y = y)) +
geom_point(size = 4)
When working with mixed categorical and continuous inputs, it is important to consider whether the output-to-input relationship depends on the categorical variable. A simple way to explore such an effect is to color the markers in a scatter plot by the categorical variable.
Repeat the same scatter plot between y and
x that you created in 6b) but this time map the
color aesthetic to the v categorical
variable.
Does the output to input relationship appear to vary across
the levels (categories) of v?
HINT: Remember the importance of the the aes()
function!
Add your code chunks here.
dfC %>% ggplot(aes(x = x, y = y, color = v)) +
geom_point(size = 4)
Looking at the scatter plot alone, only a weak overall relationship between y and x can be seen, even with the markers colored by v.
Sometimes we need to include smoothers in our plots
to help guide our interpretation of the trends. The
geom_smooth() function allows adding a
smoothing layer on top of a scatter plot. You used this
function earlier in the semester and will use it now to help your
exploration.
Repeat the same scatter plot between y and
x that you created in 6c). You should thus continue to map
color to the v variable. This time however,
add in a geom_smooth() layer. Assign the
formula argument to y ~ x and assign the
method argument to lm (you do not need
quotes). You must also map the color aesthetic to
v within the geom_smooth() layer.
Does the output to input relationship appear to vary across
the levels (categories) of v?
HINT: Remember the importance of the aes()
function!
Add your code chunks here.
dfC %>% ggplot(aes(x = x, y = y, color = v)) +
geom_point(size = 4) +
geom_smooth(aes(color = v), method = "lm", formula = y ~ x, se = FALSE)
Yes. With the addition of the linear smoother lines, the relationship becomes much easier to see: the lines for the different levels of v have different intercepts and slopes, so the output-to-input relationship clearly varies across the categories.
Let’s fit two simple models for this application. The first will use
linear additive features. Thus, you should add the effect of the
continuous input x to the categorical input
v.
Fit a non-Bayesian linear model to predict the response
y based on linear additive features between the two inputs.
Assign the model object to the modC_add object and display
the model summary() to the screen.
How many coefficients are estimated?
What does the Intercept correspond to?
Add your code chunks here.
modC_add <- lm(y ~ x + v, data = dfC)
modC_add
##
## Call:
## lm(formula = y ~ x + v, data = dfC)
##
## Coefficients:
## (Intercept) x vB vC vD
## -1.0249 -0.3562 1.7558 -0.2124 0.1857
There are 5 estimated coefficients. The intercept corresponds to the predicted value of y when v is A and x is equal to 0.
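The 5 coefficients arise because `lm()` dummy-codes the categorical input: with four levels (A, B, C, D) and R's default treatment contrasts, level A is absorbed into the intercept as the reference level, and one indicator column is created for each of the remaining three levels. We can inspect the design matrix directly to see this encoding:

```r
# Inspect the dummy (treatment) coding lm() uses for the categorical input.
# Level "A" is the reference level, so it gets no indicator column.
model.matrix(y ~ x + v, data = dfC) %>% head()
```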
Next, let’s interact the categorical and continuous inputs!
Fit a non-Bayesian linear model to predict the response
y based on interacting the categorical input v
and the continuous input x. Your model should include all
main effects and interaction features. Assign the model object to the
modC_int object and display the model
summary() to the screen.
How many coefficients are estimated?
What does the intercept correspond to?
Add your code chunks here.
modC_int <- lm(y ~ x * v, data = dfC)
summary(modC_int)
##
## Call:
## lm(formula = y ~ x * v, data = dfC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6171 -0.8263 0.1105 0.7751 1.8525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4748 0.2394 -1.984 0.051404 .
## x 1.0819 0.2740 3.948 0.000191 ***
## vB 1.4377 0.3829 3.755 0.000365 ***
## vC -0.2113 0.3493 -0.605 0.547224
## vD -0.3786 0.3409 -1.111 0.270633
## x:vB 0.1585 0.5043 0.314 0.754209
## x:vC -3.0104 0.3545 -8.492 3.13e-12 ***
## x:vD -1.2546 0.3686 -3.403 0.001127 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.054 on 67 degrees of freedom
## Multiple R-squared: 0.6643, Adjusted R-squared: 0.6293
## F-statistic: 18.94 on 7 and 67 DF, p-value: 1.085e-13
There are 8 estimated coefficients. The intercept corresponds to the predicted value of y when v is A and x is equal to 0.
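In the interaction model, each level of v has its own slope with respect to x: the slope for the reference level A is the `x` coefficient alone, and the slope for any other level adds the matching `x:v` interaction coefficient. A sketch deriving the level-specific slopes from the fitted coefficients, assuming `modC_int` is the model fit above:

```r
# Derive the slope of y with respect to x for each level of v
# from the interaction model's coefficients.
b <- coef(modC_int)

slopes <- c(A = b[["x"]],
            B = b[["x"]] + b[["x:vB"]],
            C = b[["x"]] + b[["x:vC"]],
            D = b[["x"]] + b[["x:vD"]])

slopes
```

These slopes should match the slopes of the `geom_smooth()` lines in the colored scatter plot from 6d).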
Which of the two models, modC_add or
modC_int, are better?
Is your result consistent with the figure you created in 6d)?
Add code chunks here and what do you think?
summary(modC_add)
##
## Call:
## lm(formula = y ~ x + v, data = dfC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7640 -0.9464 0.0123 1.0740 3.0260
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0249 0.3327 -3.081 0.00295 **
## x -0.3562 0.2025 -1.759 0.08294 .
## vB 1.7558 0.5479 3.204 0.00204 **
## vC -0.2124 0.5088 -0.417 0.67769
## vD 0.1857 0.4956 0.375 0.70897
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.585 on 70 degrees of freedom
## Multiple R-squared: 0.2075, Adjusted R-squared: 0.1622
## F-statistic: 4.583 on 4 and 70 DF, p-value: 0.002408
summary(modC_int)
##
## Call:
## lm(formula = y ~ x * v, data = dfC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6171 -0.8263 0.1105 0.7751 1.8525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4748 0.2394 -1.984 0.051404 .
## x 1.0819 0.2740 3.948 0.000191 ***
## vB 1.4377 0.3829 3.755 0.000365 ***
## vC -0.2113 0.3493 -0.605 0.547224
## vD -0.3786 0.3409 -1.111 0.270633
## x:vB 0.1585 0.5043 0.314 0.754209
## x:vC -3.0104 0.3545 -8.492 3.13e-12 ***
## x:vD -1.2546 0.3686 -3.403 0.001127 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.054 on 67 degrees of freedom
## Multiple R-squared: 0.6643, Adjusted R-squared: 0.6293
## F-statistic: 18.94 on 7 and 67 DF, p-value: 1.085e-13
coefplot::coefplot(modC_add)
coefplot::coefplot(modC_int)
AIC(modC_add)
## [1] 288.7518
AIC(modC_int)
## [1] 230.3214
According to both the AIC (230.3 vs. 288.8) and the R-squared (0.664 vs. 0.208), modC_int is the better model. This result is consistent with the figure in 6d): the smoother lines had different slopes across the levels of v, which is exactly the interaction between v and x that modC_int captures.
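Because the additive model is nested within the interaction model, the comparison can also be made formally with an F-test:

```r
# Nested model comparison: the F-test assesses whether the interaction
# terms significantly improve on the additive model.
anova(modC_add, modC_int)
```

A small p-value here would agree with the AIC and R-squared results in favoring modC_int.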